349 research outputs found

Accurate modeling of confounding variation in eQTL studies leads to a great increase in power to detect trans-regulatory effects

Author: Neil Lawrence
Nicolo Fusi
Oliver Stegle
Publication date: 02/06/2011

Expression quantitative trait loci (eQTL) studies are an integral tool to investigate the genetic component of gene expression variation. A major challenge in the analysis of such studies is the presence of hidden confounding factors, such as unobserved covariates or unknown environmental influences. These factors can induce a pronounced artifactual correlation structure in the expression profiles, which may create spurious associations or mask real genetic association signals.

Here, we report PANAMA (Probabilistic ANAlysis of genoMic dAta), a novel probabilistic model to account for confounding factors within an eQTL analysis. In contrast to previous methods, PANAMA learns hidden factors jointly with the effect of prominent genetic regulators. As a result, PANAMA can more accurately distinguish between true genetic association signals and confounding variation.

We applied our model and compared it to existing methods on a variety of datasets and biological systems. PANAMA consistently performs better than alternative methods and, in particular, finds substantially more trans regulators. Importantly, PANAMA not only identifies a greater number of associations, but also yields hits that are biologically more plausible and can be better reproduced between independent studies.

Nature Precedings

Statistical Tests for Detecting Differential RNA-Transcript Expression from Read Counts

Author: Gunnar Rätsch
Karsten Borgwardt
Oliver Stegle
Philipp Drewe
Regina Bohnert
Publication date: 01/01/2010

As a fruit of the current revolution in sequencing technology, transcriptomes can now be analyzed at an unprecedented level of detail. These advances have been exploited for detecting differentially expressed genes across biological samples and for quantifying the abundances of various RNA transcripts within one gene. However, explicit strategies for detecting the hidden differential abundances of RNA transcripts in biological samples have not been defined. In this work, we present two novel statistical tests to address this issue: a 'gene structure sensitive' Poisson test for detecting differential expression when the transcript structure of the gene is known, and a kernel-based test called Maximum Mean Discrepancy (MMD) when it is unknown. We analyzed the proposed approaches on simulated read data for two artificial samples as well as on real reads generated by the Illumina Genome Analyzer for two _C. elegans_ samples. Our analysis shows that the Poisson test identifies genes with differential transcript expression considerably better than previously proposed RNA transcript quantification approaches for this task. The MMD test is able to detect a large fraction (75%) of such differential cases without knowledge of the annotated transcripts. It is therefore well suited to analyzing RNA-Seq experiments when the genome annotations are incomplete or not available, where other approaches have to fail.
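To make the kernel-based idea in this abstract concrete, the following is a minimal sketch of a generic two-sample Maximum Mean Discrepancy test. It is not the authors' implementation: the Gaussian (RBF) kernel, the fixed `bandwidth`, the permutation-based p-value, and the representation of each observation as a numeric feature vector (for example, a per-position read-coverage profile) are all assumptions made for illustration, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


def mmd_permutation_test(x, y, bandwidth=1.0, n_perm=200, seed=0):
    """Return (statistic, p_value), with the null built by shuffling labels.

    The permutation null is an assumption of this sketch, not a detail
    taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[perm[:n]], pooled[perm[n:]], bandwidth)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `mmd_permutation_test(x, y)` on two arrays of shape `(n_observations, n_features)` returns the statistic and an approximate p-value. In practice the bandwidth would need to be chosen with care, for instance from the median pairwise distance, since the power of the test depends on it.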

Crossref

Nature Precedings

MPG.PuRe

SpatialDE: identification of spatially variable genes.

Author: Oliver Stegle
Valentine Svensson
Sarah A. Teichmann
Publication venue: Nature Methods
Publication date: 01/05/2018

Technological advances have made it possible to measure spatially resolved gene expression at high throughput. However, methods to analyze these data are not established. Here we describe SpatialDE, a statistical test to identify genes with spatial patterns of expression variation from multiplexed imaging or spatial RNA-sequencing data. SpatialDE also implements 'automatic expression histology', a spatial gene-clustering approach that enables expression-based tissue histology.

Apollo (Cambridge)

Warped linear mixed models for the genetic analysis of transformed phenotypes.

Author: Nicolo Fusi
Neil D. Lawrence
Christoph Lippert
Oliver Stegle
Publication venue: Nature Communications
Publication date: 19/09/2014

Linear mixed models (LMMs) are a powerful and established tool for studying genotype-phenotype relationships. A limitation of the LMM is that the model assumes Gaussian distributed residuals, a requirement that rarely holds in practice. Violations of this assumption can lead to false conclusions and loss of power. To mitigate this problem, it is common practice to pre-process the phenotypic values to make them as Gaussian as possible, for instance by applying logarithmic or other nonlinear transformations. Unfortunately, different phenotypes require different transformations, and choosing an appropriate transformation is challenging and subjective. Here we present an extension of the LMM that estimates an optimal transformation from the observed data. In simulations and applications to real data from human, mouse and yeast, we show that using transformations inferred by our model increases power in genome-wide association studies and increases the accuracy of heritability estimation and phenotype prediction.

PubMed Central

Apollo (Cambridge)

MDC Repository

White Rose Research Online

Detecting low-complexity unobserved causes

Author: Dominik Janzing
Jonas Peters
Bernhard Schoelkopf
Eleni Sgouritsa
Oliver Stegle
Publication date: 01/01/2011

We describe a method that infers whether statistical dependences between two observed variables X and Y are due to a "direct" causal link or only due to a connecting causal path that contains an unobserved variable of low complexity, e.g., a binary variable. This problem is motivated by statistical genetics. Given a genetic marker that is correlated with a phenotype of interest, we want to detect whether this marker is causal or only correlates with a causal one. Our method is based on the analysis of the location of the conditional distributions P(Y|x) in the simplex of all distributions of Y. We report encouraging results on semi-empirical data.

arXiv.org e-Print Archive

CiteSeerX

MPG.PuRe

f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq.

Author: Florian Buettner
John C. Marioni
Davis J. McCarthy
Naruemon Pratanwanich
Oliver Stegle
Publication venue: Genome Biology
Publication date: 01/01/2017

Single-cell RNA-sequencing (scRNA-seq) allows studying heterogeneity in gene expression in large cell populations. Such heterogeneity can arise from technical or biological factors, which makes it difficult to decompose the sources of variation. Here we describe f-scLVM (factorial single-cell latent variable model), a method based on factor analysis that uses pathway annotations to guide the inference of interpretable factors underpinning the heterogeneity. Our model jointly estimates the relevance of individual factors, refines gene set annotations, and infers factors without annotation. In applications to multiple scRNA-seq datasets, we find that f-scLVM robustly decomposes scRNA-seq datasets into interpretable components, thereby facilitating the identification of novel subpopulations.

Directory of Open Access Journals

PuSH

Apollo (Cambridge)

University of Melbourne Institutional Repository

FigShare

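Sharing genome data FAIRly and securely: the German Human Genome-Phenome Archive (GHGA) as a building block of the National Research Data Infrastructure (original title: Genomdaten FAIR und sicher teilen: Das Deutsche Humangenom-Phänom Archiv (GHGA) als Baustein der Nationalen Forschungsdateninfrastruktur)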

Author: Eva Winkler
Jan Eufinger
Jan Korbel
Oliver Kohlbacher
Oliver Stegle
Publication venue: Gemeinsame Arbeitsgruppe Forschungsdaten der Deutschen Initiative für Netzwerkinformationen e.V. (DINI) und von nestor - Deutsches Kompetenznetzwerk zur digitalen Langzeitarchivierung
Publication date: 01/07/2021

Human genome data and other related omics data obtained with modern sequencing methods are an integral part of biomedical research. In the future, these data will also increasingly shape clinical care. The need to use data openly and in a FAIR manner for research must always be balanced and weighed against protecting the privacy of patients. Access can be granted only in compliance with the necessary technical and organizational safeguards and for legitimate research purposes. At the European level, the European Genome-phenome Archive (EGA) already exists for this purpose. Because the central EGA infrastructure can only inadequately reflect country-specific data protection regulations, it is planned to transform it into a federated infrastructure of national nodes ("federated EGA"). The aim of the NFDI project GHGA is to build a genome archive as a national EGA node for the secure storage, access and analysis of human omics data (e.g. genomes, transcriptomes) within a uniform ethical and legal framework. GHGA will also take into account the research community's wishes for efficient, user-friendly analyses at large scale and for replicating results in other cohorts. GHGA builds on existing national omics data providers and their IT infrastructures to create a harmonized, interoperable infrastructure. The goal is to enable researchers in Germany to exchange human genome data with legal certainty in accordance with the FAIR principles, and to play a stronger role in shaping international standards for data exchange. GHGA is embedded in accompanying international research networks such as the European 1+ Million Genomes Initiative.

Directory of Open Access Journals